feat: Add new hpa metrics to prevent prometheus timeseries duplication #2614

Open. CountryTk wants to merge 7 commits into main.


@CountryTk commented Feb 19, 2025

What this PR does / why we need it:

Adds 4 new HPA metrics to prevent the duplicated time series described in issue #2403.

The new metrics are:

  • kube_horizontalpodautoscaler_spec_target_container_metric
  • kube_horizontalpodautoscaler_spec_target_object_metric
  • kube_horizontalpodautoscaler_status_target_container_metric
  • kube_horizontalpodautoscaler_status_target_object_metric

How does this change affect the cardinality of KSM: (increases, decreases or does not change cardinality)
Cardinality increases because of the new metrics.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2403

FYI: I've also tested this change in our prelive cluster, and it fixed the issue for us.

linux-foundation-easycla bot commented Feb 19, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: CountryTk
Once this PR has been reviewed and has the lgtm label, please assign rexagod for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the cncf-cla: no and needs-triage labels Feb 19, 2025
@k8s-ci-robot
Contributor

Welcome @CountryTk!

It looks like this is your first PR to kubernetes/kube-state-metrics 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kube-state-metrics has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot added the size/L and cncf-cla: yes labels and removed the cncf-cla: no label Feb 19, 2025
@richabanker
Contributor

/triage accepted
@CatherineF-dev @rexagod could you take a look here if possible, thanks!

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label Feb 20, 2025
@CountryTk
Author

Hey, could this please be reviewed @CatherineF-dev @rexagod

@CatherineF-dev
Contributor

OK! Reviewing now.

@CatherineF-dev
Contributor

Overall LGTM. Two small comments.

@CountryTk
Author

CountryTk commented Mar 4, 2025

> Overall LGTM. Two small comments.

Thanks for the review, I've implemented your suggestions @CatherineF-dev

@CountryTk requested a review from CatherineF-dev March 4, 2025 13:23
@rexagod
Member

rexagod commented Mar 16, 2025

Thank you for the patch.

I believe this identifies a resource-agnostic pitfall where we loop over certain nested fields without including a primary key in the overall generated metrics' label-sets.

I'll take a closer look tomorrow but so far this lgtm.
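
The pitfall described above can be illustrated with a minimal Go sketch. It assumes the collision arises when two nested metric entries on one HPA share a metric name but differ only in a field that is not exported as a label. `Sample`, `seriesKey`, `findDuplicates`, `hpaLabels` and the `object_name` values are hypothetical names used for illustration, not kube-state-metrics code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a simplified stand-in for one exposed time series.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// seriesKey builds a canonical identity for a series: the metric name plus
// its labels sorted by key. Two samples with the same key collide.
func seriesKey(s Sample) string {
	keys := make([]string, 0, len(s.Labels))
	for k := range s.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(s.Name)
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%q", k, s.Labels[k])
	}
	return b.String()
}

// findDuplicates returns, in sorted order, every series key seen more than once.
func findDuplicates(samples []Sample) []string {
	seen := map[string]int{}
	for _, s := range samples {
		seen[seriesKey(s)]++
	}
	var dups []string
	for k, n := range seen {
		if n > 1 {
			dups = append(dups, k)
		}
	}
	sort.Strings(dups)
	return dups
}

// hpaLabels returns the shared HPA labels; a non-empty objectName adds the
// (hypothetical) distinguishing label.
func hpaLabels(objectName string) map[string]string {
	l := map[string]string{
		"namespace": "ns1", "horizontalpodautoscaler": "hpa1",
		"metric_name": "hits", "metric_target_type": "average",
	}
	if objectName != "" {
		l["object_name"] = objectName
	}
	return l
}

func main() {
	name := "kube_horizontalpodautoscaler_spec_target_metric"
	// Two nested entries, no distinguishing label: identical label sets.
	without := []Sample{{name, hpaLabels(""), 12}, {name, hpaLabels(""), 30}}
	// Same entries with a distinguishing label added.
	with := []Sample{{name, hpaLabels("route-a"), 12}, {name, hpaLabels("route-b"), 30}}
	fmt.Println(len(findDuplicates(without))) // prints 1
	fmt.Println(len(findDuplicates(with)))    // prints 0
}
```

A check like `findDuplicates` could in principle be run in a unit test over any generated metric family to catch the same class of problem for other resources.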

Member

@rexagod left a comment

Question, wouldn't the earlier kube_horizontalpodautoscaler_spec_target_metric still be there after this is in, and cause the error to still show up?

@CatherineF-dev
Contributor

CatherineF-dev commented Mar 18, 2025

Could you help enable allowed changes from maintainers?

I want to make a small change.

Or could you apply these small changes? 96d948a

@CountryTk
Author

CountryTk commented Mar 18, 2025

> Could you help enable allowed changes from maintainers?
>
> I want to make a small change.
>
> Or could you apply these small changes? 96d948a

[image]

For me it shows maintainer edit access is enabled.

Anyway, I've added your suggested changes in the latest commit.

> Question, wouldn't the earlier kube_horizontalpodautoscaler_spec_target_metric still be there after this is in, and cause the error to still show up?

No, because kube_horizontalpodautoscaler_spec_target_metric now only covers PodsMetricSourceType, ResourceMetricSourceType and ExternalMetricSourceType, which did not cause the duplicate errors.

ContainerResourceMetricSourceType and ObjectMetricSourceType now have separate functions with added labels to prevent duplicated data.
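
A minimal sketch of the split described in this reply, assuming each source type is routed to exactly one spec metric family. `specFamilyFor` is a hypothetical helper, and the string constants are local copies of the autoscaling/v2 source type values so the example compiles without Kubernetes imports. The actual code in the PR may be organised differently.

```go
package main

import "fmt"

// Local copies of the autoscaling/v2 MetricSourceType string values, kept
// here so the sketch has no dependency on k8s.io/api.
const (
	ObjectType            = "Object"
	PodsType              = "Pods"
	ResourceType          = "Resource"
	ContainerResourceType = "ContainerResource"
	ExternalType          = "External"
)

// specFamilyFor (hypothetical) maps a source type to the spec metric family
// that would carry it after the change described in this thread.
func specFamilyFor(sourceType string) (string, bool) {
	switch sourceType {
	case PodsType, ResourceType, ExternalType:
		// These types stay on the original metric.
		return "kube_horizontalpodautoscaler_spec_target_metric", true
	case ContainerResourceType:
		return "kube_horizontalpodautoscaler_spec_target_container_metric", true
	case ObjectType:
		return "kube_horizontalpodautoscaler_spec_target_object_metric", true
	}
	return "", false
}

func main() {
	for _, t := range []string{PodsType, ResourceType, ExternalType, ContainerResourceType, ObjectType} {
		family, _ := specFamilyFor(t)
		fmt.Printf("%-17s -> %s\n", t, family)
	}
}
```

Under this routing, a series for an Object or ContainerResource source never lands in the original family, which is why the original metric no longer sees the colliding label sets.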

kube_horizontalpodautoscaler_spec_target_metric{horizontalpodautoscaler="hpa1",metric_name="events",metric_target_type="average",namespace="ns1"} 30
kube_horizontalpodautoscaler_spec_target_metric{horizontalpodautoscaler="hpa1",metric_name="hits",metric_target_type="average",namespace="ns1"} 12
Contributor

These metrics are deleted. Could you double check?

Author

@CountryTk Mar 18, 2025

The metric name "hits" and some others are now reported under their respective dedicated metrics.
For example, hits is now under kube_horizontalpodautoscaler_spec_target_object_metric instead of kube_horizontalpodautoscaler_spec_target_metric.

Contributor

This seems to be a breaking change. Could we keep kube_horizontalpodautoscaler_spec_target_metric unchanged?

Author

Yes, it is a breaking change, but I was under the impression that it's fine since these metrics are experimental.
#2403 (comment)

> I like the second option since the metric is still experimental so we can make changes to it. It will surely break some users but we didn't provide guarantees for this metrics. So as long as we make a smooth transition it should be fine.
> The benefit of this approach is that the metrics will be specialized to the metric type they are targeting, so the labels will make sense for it and we won't run in scenarios where we have labels that are not used by some types.

Member

@rexagod Mar 22, 2025

kube_horizontalpodautoscaler_spec_target_object_metric{target_name="",horizontalpodautoscaler="hpa1",metric_name="hits",metric_target_type="average",namespace="ns1"} 12

@dgrisonnet I'm not sure if this is what you meant in your comment (#2403 (comment)), can you PTAL?

Author

@CatherineF-dev Could I get some clarification on this task? Is it blocked or can we merge it?

Member

@dgrisonnet Jun 18, 2025

@rexagod Yes, that is similar to what I had in mind, and we would do the same for container_resource_metrics, for resource_metrics, ... That said, after reading my initial comment again, the idea of adding the fully qualified name of the object is wrong, as it would make correlation more difficult.

I think we should have something like this:

kube_horizontalpodautoscaler_spec_target_object_metric{
  object_kind="",
  object_apiversion="",
  object_name="",
  horizontalpodautoscaler="hpa1",
  metric_name="hits",
  metric_target_type="average",
  namespace="ns1"
}

kube_horizontalpodautoscaler_spec_target_container_resource_metric{
  container_name="",
  horizontalpodautoscaler="hpa1",
  metric_name="hits",
  metric_target_type="average",
  namespace="ns1"
}

We could instead extend the existing metric and add target_name, target_kind and target_apiversion to it, but I feel that dedicated metrics would be easier to extend and work with.
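
As a rough illustration of the label layout proposed above, the following Go sketch renders one series for the object metric in Prometheus text form. `formatSeries` is a hypothetical helper, and the `Ingress`, `networking.k8s.io/v1` and `main-route` values are made-up examples. Quoting with `%q` only approximates the escaping rules of the exposition format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatSeries (hypothetical) renders a single sample as
// name{k1="v1",k2="v2"} value, keeping the given label order.
func formatSeries(name string, keys, values []string, v float64) (string, error) {
	if len(keys) != len(values) {
		return "", fmt.Errorf("got %d label keys but %d values", len(keys), len(values))
	}
	pairs := make([]string, 0, len(keys))
	for i, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, values[i]))
	}
	val := strconv.FormatFloat(v, 'g', -1, 64)
	return fmt.Sprintf("%s{%s} %s", name, strings.Join(pairs, ","), val), nil
}

func main() {
	// Label names follow the proposal in this comment; values are illustrative.
	keys := []string{
		"object_kind", "object_apiversion", "object_name",
		"horizontalpodautoscaler", "metric_name", "metric_target_type", "namespace",
	}
	values := []string{
		"Ingress", "networking.k8s.io/v1", "main-route",
		"hpa1", "hits", "average", "ns1",
	}
	line, err := formatSeries("kube_horizontalpodautoscaler_spec_target_object_metric", keys, values, 12)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Keeping kind, API version and name as three separate labels, rather than one fully qualified string, means each can be matched or joined on individually in queries, which is the correlation concern raised in the comment.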

Author

@dgrisonnet I'm a bit confused here. Should I delete the metrics I created and instead add target_name, target_kind and target_apiversion to the original metrics (kube_horizontalpodautoscaler_spec_target_metric and kube_horizontalpodautoscaler_status_target_metric), or should I leave it as it is right now, because

> having dedicated metrics would be easier to extend and work with

Member

What you did is in line with what I had in mind; I was just sharing more details about my reasoning with @rexagod.

Author

@dgrisonnet It seems to me that this PR is in limbo of some sort.
Is there any chance this can get merged?

@CountryTk requested a review from rexagod March 30, 2025 19:50
@rexagod
Member

rexagod commented May 7, 2025

I'll bring this up in the call tomorrow.

@CountryTk
Author

> I'll bring this up in the call tomorrow.

Hey, any updates?

@CountryTk
Author

Is there any chance we can get this merged?

@CountryTk
Author

@rexagod Any possibility we can get this merged?

@dudicoco

Hi @CountryTk,

It looks like this PR does not fix external metric types.
Currently, when an HPA has multiple external metrics, kube-state-metrics cannot deduplicate them properly.
I believe the correct way to deduplicate them is by the selector.matchLabels field.
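
One way the suggestion above could be sketched in Go is to turn `selector.matchLabels` into a deterministic string that can serve as an extra label value. `selectorKey` and the `queue`/`env` labels are hypothetical, and nothing like this is part of the PR as discussed in this thread.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectorKey (hypothetical) flattens matchLabels into "k1=v1,k2=v2" with
// keys sorted, so the same selector always yields the same string.
func selectorKey(matchLabels map[string]string) string {
	keys := make([]string, 0, len(matchLabels))
	for k := range matchLabels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+matchLabels[k])
	}
	return strings.Join(pairs, ",")
}

func main() {
	// Two external metrics with the same name but different selectors.
	first := map[string]string{"queue": "orders", "env": "prod"}
	second := map[string]string{"queue": "payments", "env": "prod"}
	fmt.Println(selectorKey(first))
	fmt.Println(selectorKey(second))
}
```

Sorting the keys matters because Go map iteration order is not stable; without it, the same selector could produce different label values between scrapes.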
